[comment {-*- tcl -*- doctools manpage}]
[manpage_begin grammar::fa::op n 0.4]
[keywords automaton]
[keywords {finite automaton}]
[keywords grammar]
[keywords parsing]
[keywords {regular expression}]
[keywords {regular grammar}]
[keywords {regular languages}]
[keywords state]
[keywords transducer]
[copyright {2004-2008 Andreas Kupries <andreas_kupries@users.sourceforge.net>}]
[moddesc   {Finite automaton operations and usage}]
[titledesc {Operations on finite automatons}]
[category  {Grammars and finite automata}]
[require Tcl 8.4]
[require snit]
[require struct::list]
[require struct::set]
[require grammar::fa::op [opt 0.4.1]]
[description]
[para]

This package provides a number of complex operations on finite
automatons (Short: FA),

as provided by the package [package grammar::fa].

The package does not provide the ability to create and/or manipulate
such FAs, nor the ability to execute a FA for a stream of symbols.

Use the packages [package grammar::fa]
and [package grammar::fa::interpreter] for that.

Another package related to this is [package grammar::fa::compiler]
which turns a FA into an executor class which has the definition of
the FA hardwired into it.

[para]

For more information about what a finite automaton is see section
[emph {FINITE AUTOMATONS}] in package [package grammar::fa].

[section API]

The package exports the API described here.  All commands modify their
first argument. I.e. whatever FA they compute is stored back into
it. Some of the operations will construct an automaton whose states
are all new, but related to the states in the source
automaton(s). These operations take variable names as optional
arguments where they will store mappings which describe the
relationship(s).

The operations can be loosely partitioned into structural and language
operations. The latter are defined in terms of the language the
automaton(s) accept, whereas the former are defined in terms of the
structural properties of the involved automaton(s). Some operations
are both.

[emph {Structure operations}]

[list_begin definitions]

[call [cmd ::grammar::fa::op::constructor] [arg cmd]]

This command has to be called by the user of the package before any other
operations is performed, to establish a command which can be used to
construct a FA container object. If this is not done several operations
will fail as they are unable to construct internal and transient containers
to hold state and/or partial results.

[para]

Any container class using this package for complex operations should set
its own class command as the constructor. See package [package grammar::fa]
for an example.

[call [cmd ::grammar::fa::op::reverse] [arg fa]]

Reverses the [arg fa]. This is done by reversing the direction of all
transitions and swapping the sets of [term start] and [term final]
states. The language of [arg fa] changes unpredictably.

[call [cmd ::grammar::fa::op::complete] [arg fa] [opt [arg sink]]]

Completes the [arg fa] [term complete], but nothing is done if the
[arg fa] is already [term complete]. This implies that only the first
in a series of multiple consecutive complete operations on [arg fa]
will perform anything. The remainder will be null operations.

[para]

The language of [arg fa] is unchanged by this operation.

[para]

This is done by adding a single new state, the [term sink], and
transitions from all other states to that sink for all symbols they
have no transitions for. The sink itself is made complete by adding
loop transitions for all symbols.

[para]

Note: When a FA has epsilon-transitions transitions over a symbol for
a state S can be indirect, i.e. not attached directly to S, but to a
state in the epsilon-closure of S. The symbols for such indirect
transitions count when computing completeness of a state. In other
words, these indirectly reached symbols are [emph not] missing.

[para]

The argument [arg sink] provides the name for the new state and most
not be present in the [arg fa] if specified. If the name is not
specified the command will name the state "sink[var n]", where [var n]
is set so that there are no collisions with existing states.

[para]

Note that the sink state is [term {not useful}] by definition.  In
other words, while the FA becomes complete, it is also
[term {not useful}] in the strict sense as it has a state from which
no final state can be reached.

[call [cmd ::grammar::fa::op::remove_eps] [arg fa]]

Removes all epsilon-transitions from the [arg fa] in such a manner the
the language of [arg fa] is unchanged. However nothing is done if the
[arg fa] is already [term epsilon-free].

This implies that only the first in a series of multiple consecutive
complete operations on [arg fa] will perform anything. The remainder
will be null operations.

[para]

[emph Note:] This operation may cause states to become unreachable or
not useful. These states are not removed by this operation.

Use [cmd ::grammar::fa::op::trim] for that instead.

[call [cmd ::grammar::fa::op::trim] [arg fa] [opt [arg what]]]

Removes unwanted baggage from [arg fa].

The legal values for [arg what] are listed below. The command defaults
to [const !reachable|!useful] if no specific argument was given.

[list_begin definitions]
[def [const !reachable]]
Removes all states which are not reachable from a start state.

[def [const !useful]]
Removes all states which are unable to reach a final state.

[def [const !reachable&!useful]]
[def [const !(reachable|useful)]]
Removes all states which are not reachable from a start state and are
unable to reach a final state.

[def [const !reachable|!useful]]
[def [const !(reachable&useful)]]
Removes all states which are not reachable from a start state or are
unable to reach a final state.

[list_end]
[para]

[call [cmd ::grammar::fa::op::determinize] [arg fa] [opt [arg mapvar]]]

Makes the [arg fa] deterministic without changing the language
accepted by the [arg fa]. However nothing is done if the [arg fa] is
already [term deterministic]. This implies that only the first in a
series of multiple consecutive complete operations on [arg fa] will
perform anything. The remainder will be null operations.

[para]

The command will store a dictionary describing the relationship
between the new states of the resulting dfa and the states of the
input nfa in [arg mapvar], if it has been specified. Keys of the
dictionary are the handles for the states of the resulting dfa, values
are sets of states from the input nfa.

[para]

[emph Note]: An empty dictionary signals that the command was able to
make the [arg fa] deterministic without performing a full subset
construction, just by removing states and shuffling transitions around
(As part of making the FA epsilon-free).

[para]

[emph Note]: The algorithm fails to make the FA deterministic in the
technical sense if the FA has no start state(s), because determinism
requires the FA to have exactly one start states.

In that situation we make a best effort; and the missing start state
will be the only condition preventing the generated result from being
[term deterministic].

It should also be noted that in this case the possibilities for
trimming states from the FA are also severely reduced as we cannot
declare states unreachable.

[call [cmd ::grammar::fa::op::minimize] [arg fa] [opt [arg mapvar]]]

Creates a FA which accepts the same language as [arg fa], but has a
minimal number of states. Uses Brzozowski's method to accomplish this.

[para]

The command will store a dictionary describing the relationship
between the new states of the resulting minimal fa and the states of
the input fa in [arg mapvar], if it has been specified. Keys of the
dictionary are the handles for the states of the resulting minimal fa,
values are sets of states from the input fa.

[para]

[emph Note]: An empty dictionary signals that the command was able to
minimize the [arg fa] without having to compute new states. This
should happen if and only if the input FA was already minimal.

[para]

[emph Note]: If the algorithm has no start or final states to work
with then the result might be technically minimal, but have a very
unexpected structure.

It should also be noted that in this case the possibilities for
trimming states from the FA are also severely reduced as we cannot
declare states unreachable.

[list_end]

[emph {Language operations}]

All operations in this section require that all input FAs have at
least one start and at least one final state. Otherwise the language of
the FAs will not be defined, making the operation senseless (as it
operates on the languages of the FAs in a defined manner).

[list_begin definitions]

[call [cmd ::grammar::fa::op::complement] [arg fa]]

Complements [arg fa]. This is possible if and only if [arg fa] is
[term complete] and [term deterministic]. The resulting FA accepts the
complementary language of [arg fa]. In other words, all inputs not
accepted by the input are accepted by the result, and vice versa.

[para]

The result will have all states and transitions of the input, and
different final states.

[call [cmd ::grammar::fa::op::kleene] [arg fa]]

Applies Kleene's closure to [arg fa].

The resulting FA accepts all strings [var S] for which we can find a
natural number [var n] (0 inclusive) and strings [var A1] ... [var An]
in the language of [arg fa] such that [var S] is the concatenation of
[var A1] ... [var An].

In other words, the language of the result is the infinite union over
finite length concatenations over the language of [arg fa].

[para]

The result will have all states and transitions of the input, and new
start and final states.

[call [cmd ::grammar::fa::op::optional] [arg fa]]

Makes the [arg fa] optional. In other words it computes the FA which
accepts the language of [arg fa] and the empty the word (epsilon) as
well.

[para]

The result will have all states and transitions of the input, and new
start and final states.

[call [cmd ::grammar::fa::op::union] [arg fa] [arg fb] [opt [arg mapvar]]]

Combines the FAs [arg fa] and [arg fb] such that the resulting FA
accepts the union of the languages of the two FAs.

[para]

The result will have all states and transitions of the two input FAs,
and new start and final states. All states of [arg fb] which exist in
[arg fa] as well will be renamed, and the [arg mapvar] will contain a
mapping from the old states of [arg fb] to the new ones, if present.

[para]

It should be noted that the result will be non-deterministic, even if
the inputs are deterministic.

[call [cmd ::grammar::fa::op::intersect] [arg fa] [arg fb] [opt [arg mapvar]]]

Combines the FAs [arg fa] and [arg fb] such that the resulting FA
accepts the intersection of the languages of the two FAs. In other
words, the result will accept a word if and only if the word is
accepted by both [arg fa] and [arg fb]. The result will be useful, but
not necessarily deterministic or minimal.

[para]

The command will store a dictionary describing the relationship
between the new states of the resulting fa and the pairs of states of
the input FAs in [arg mapvar], if it has been specified. Keys of the
dictionary are the handles for the states of the resulting fa, values
are pairs of states from the input FAs. Pairs are represented by
lists. The first element in each pair will be a state in [arg fa], the
second element will be drawn from [arg fb].

[call [cmd ::grammar::fa::op::difference] [arg fa] [arg fb] [opt [arg mapvar]]]

Combines the FAs [arg fa] and [arg fb] such that the resulting FA
accepts the difference of the languages of the two FAs. In other
words, the result will accept a word if and only if the word is
accepted by [arg fa], but not by [arg fb]. This can also be expressed
as the intersection of [arg fa] with the complement of [arg fb]. The
result will be useful, but not necessarily deterministic or minimal.

[para]

The command will store a dictionary describing the relationship
between the new states of the resulting fa and the pairs of states of
the input FAs in [arg mapvar], if it has been specified. Keys of the
dictionary are the handles for the states of the resulting fa, values
are pairs of states from the input FAs. Pairs are represented by
lists. The first element in each pair will be a state in [arg fa], the
second element will be drawn from [arg fb].

[call [cmd ::grammar::fa::op::concatenate] [arg fa] [arg fb] [opt [arg mapvar]]]

Combines the FAs [arg fa] and [arg fb] such that the resulting FA
accepts the cross-product of the languages of the two FAs. I.e. a word
W will be accepted by the result if there are two words A and B
accepted by [arg fa], and [arg fb] resp. and W is the concatenation of
A and B.

[para]

The result FA will be non-deterministic.

[call [cmd ::grammar::fa::op::fromRegex] [arg fa] [arg regex] [opt [arg over]]]

Generates a non-deterministic FA which accepts the same language as
the regular expression [arg regex]. If the [arg over] is specified it
is treated as the set of symbols the regular expression and the
automaton are defined over. The command will compute the set from the
"S" constructors in [arg regex] when [arg over] was not
specified. This set is important if and only if the complement
operator "!" is used in [arg regex] as the complementary language of
an FA is quite different for different sets of symbols.

[para]

The regular expression is represented by a nested list, which forms
a syntax tree. The following structures are legal:

[list_begin definitions]

[def "{S x}"]

Atomic regular expression. Everything else is constructed from
these. Accepts the [const S]ymbol "x".

[def "{. A1 A2 ...}"]

Concatenation operator. Accepts the concatenation of the regular
expressions [var A1], [var A2], etc.

[para]

[emph Note] that this operator accepts zero or more arguments. With zero
arguments the represented language is [term epsilon], the empty word.

[def "{| A1 A2 ...}"]

Choice operator, also called "Alternative". Accepts all input accepted
by at least one of the regular expressions [var A1], [var A2], etc. In
other words, the union of [var A1], [var A2].

[para]

[emph Note] that this operator accepts zero or more arguments. With zero
arguments the represented language is the [term empty] language,
the language without words.

[def "{& A1 A2 ...}"]

Intersection operator, logical and. Accepts all input accepted which
is accepted by all of the regular expressions [var A1], [var A2],
etc. In other words, the intersection of [var A1], [var A2].

[def "{? A}"]

Optionality operator. Accepts the empty word and anything from the
regular expression [var A].

[def "{* A}"]

Kleene closure. Accepts the empty word and any finite concatenation of
words accepted by the regular expression [var A].

[def "{+ A}"]

Positive Kleene closure. Accepts any finite concatenation of words
accepted by the regular expression [var A], but not the empty word.

[def "{! A}"]

Complement operator. Accepts any word not accepted by the regular
expression [var A]. Note that the complement depends on the set of
symbol the result should run over. See the discussion of the argument
[arg over] before.

[list_end]

[call [cmd ::grammar::fa::op::toRegexp] [arg fa]]

This command generates and returns a regular expression which accepts
the same language as the finite automaton [arg fa]. The regular
expression is in the format as described above, for
[cmd ::grammar::fa::op::fromRegex].

[call [cmd ::grammar::fa::op::toRegexp2] [arg fa]]

This command has the same functionality as [cmd ::grammar::fa::op::toRegexp],
but uses a different algorithm to simplify the generated regular expressions.

[call [cmd ::grammar::fa::op::toTclRegexp] [arg regexp] [arg symdict]]

This command generates and returns a regular expression in Tcl syntax for the
regular expression [arg regexp], if that is possible. [arg regexp] is in the
same format as expected by [cmd ::grammar::fa::op::fromRegex].

[para]

The command will fail and throw an error if [arg regexp] contains
complementation and intersection operations.

[para]

The argument [arg symdict] is a dictionary mapping symbol names to
pairs of [term {syntactic type}] and Tcl-regexp. If a symbol
occurring in the [arg regexp] is not listed in this dictionary then
single-character symbols are considered to designate themselves
whereas multiple-character symbols are considered to be a character
class name.

[call [cmd ::grammar::fa::op::simplifyRegexp] [arg regexp]]

This command simplifies a regular expression by applying the following
algorithm first to the main expression and then recursively to all
sub-expressions:

[list_begin enum]
[enum] Convert the expression into a finite automaton.
[enum] Minimize the automaton.
[enum] Convert the automaton back to a regular expression.
[enum] Choose the shorter of original expression and expression from
the previous step.
[list_end]

[list_end]

[para]

[section EXAMPLES]

[vset CATEGORY grammar_fa]
[include ../common-text/feedback.inc]
[manpage_end]
